Hierarchical Structure and Word Strength Prediction of Mandarin Prosody
Abstract
We use Stem-ML to build an automatic learning system for Mandarin prosody that allows us to make quantitative measurements of prosodic strengths. Stem-ML is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds. Because Stem-ML describes the interactions between nearby tones or accents, we were able to use a highly constrained model with only one accent template for each lexical tone category, and a single prosodic strength per word. The model accurately reproduces the intonation of the speaker, capturing 87% of the variance of f0. The result reveals strong alternating metrical patterns in words, and shows that the speaker uses word strength to mark a hierarchy of boundaries.
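The "87% of the variance of f0" figure can be read as a fraction of variance explained. The sketch below shows one common way to compute such a figure, assuming the usual definition 1 - SS_res / SS_tot; the abstract does not state the exact formula used. The function name `variance_explained` and the f0 values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def variance_explained(observed, predicted):
    """Fraction of the variance of `observed` captured by `predicted`.

    Uses 1 - SS_res / SS_tot, which is an assumed definition;
    the source abstract does not specify how its figure was computed.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("need at least one sample")

    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the measured f0 around its mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Residual sum of squares: what the model fails to reproduce.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))

    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up f0 samples in Hz (illustrative only, not from the paper).
observed_f0 = [200.0, 220.0, 180.0, 210.0, 190.0]
predicted_f0 = [205.0, 215.0, 185.0, 205.0, 195.0]

score = variance_explained(observed_f0, predicted_f0)
print(f"variance explained: {score:.3f}")
```

A value of 1.0 would mean the model reproduces the measured f0 contour exactly, while 0.0 would mean it does no better than predicting the mean f0 everywhere.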
Similar resources
A Statistical Model with Hierarchical Structure for Predicting Prosody in a Mandarin Text-to-speech System
In this paper we proposed a statistical prosody model with hierarchical structure for Mandarin Text-to-Speech (TTS) system. There are four levels in our model: syllable level, word level, breath group (prosodic phrase) level, and utterance level. Here “hierarchy” means that each lower level is a subset of a higher level. The prosodic information is first found in each level, and then they are c...
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach
Hierarchical prosody structure generation is an important but challenging component for speech synthesis systems. In this paper, we investigate the use of enhanced embedding (joint learning of character and word embedding (CWE)) features and different model fusion approaches at both character and word level for Mandarin prosodic boundaries prediction. For CWE module, the internal structures of ...
Mandarin Text-to-speech Synthesis
This chapter introduces Mandarin Text-To-Speech (MTTS) synthesis. Beginning with a brief review on the development history of MTTS and attributes of MTTS, three main constituents of the technology are presented: 1) Text processing: word segmentation, disambiguation of polyphones, and analysis of rhythm structure; 2) prosodic processing: features of Mandarin prosody, and prosody prediction, and;...
Relative Importance of Tone and Segments for the Intelligibility of Mandarin and Cantonese
This study aims to establish the relative importance of segmental and word-prosodic properties for the intelligibility of spoken Mandarin and Cantonese. Mandarin has a relatively small inventory of lexical tones (four) while Cantonese has a richer tone inventory (at least seven). Word prosody is normally redundant relative to segmental properties so that word recognition does not crucially depend...